CONVERGENCE RATES FOR BAYESIAN DENSITY ESTIMATION
OF INFINITE-DIMENSIONAL EXPONENTIAL FAMILIES 

By Catia Scricciolo 

University "L. Bocconi", Milan 

We study the rate of convergence of posterior distributions in den- 
sity estimation problems for log-densities in periodic Sobolev classes 
characterized by a smoothness parameter p. The posterior expected 
density provides a nonparametric estimation procedure attaining the 
optimal minimax rate of convergence under Hellinger loss if the pos- 
terior distribution achieves the optimal rate over certain uniformity 
classes. A prior on the density class of interest is induced by a prior 
on the coefficients of the trigonometric series expansion of the
log-density. We show that when p is known, the posterior distribution of
a Gaussian prior achieves the optimal rate provided the prior vari- 
ances die off sufficiently rapidly. For a mixture of normal distribu- 
tions, the mixing weights on the dimension of the exponential family 
are assumed to be bounded below by an exponentially decreasing se- 
quence. To avoid the use of infinite bases, we develop priors that cut 
off the series at a sample-size-dependent truncation point. When the 
degree of smoothness is unknown, a finite mixture of normal priors 
indexed by the smoothness parameter, which is also assigned a prior, 
produces the best rate. A rate-adaptive estimator is derived. 

1. Introduction. Bayesian nonparametrics is a very rapidly developing 
area of statistics. Several papers — including [1, 2, 4, 11, 14, 15, 16, 17, 18, 20, 
21, 24, 25, 26] — have been devoted to the investigation of asymptotic prop- 
erties of posterior distributions on infinite-dimensional parameter spaces. 

The problem of estimating a density function $f_0$ w.r.t. the Lebesgue measure $\lambda$ on the unit interval, given a sample of i.i.d. observations $X_1,\ldots,X_n$ from $f_0$, is considered from the Bayesian perspective. Suppose that the sampling probability measure $P_0$ lies in a class $\mathscr{P}$ of absolutely continuous probability measures w.r.t. $\lambda$, equipped with the Hellinger metric $d_H$, the $L_2$-distance between square-rooted densities. Suppose, further, that the generic density is of the form

$$f_\theta(x) = \frac{\exp\{\theta(x)\}}{\int_0^1 \exp\{\theta(t)\}\,dt}, \qquad x\in[0,1],$$

with $\theta$ a square-integrable function in a periodic Sobolev class. We recall that for any given integer $p\ge 1$ and real $L>0$, the Sobolev functional class $W(p,L)$ comprises all square-integrable functions with absolutely continuous derivative of order $p-1$ and $p$th derivative bounded in $L_2$-norm,

$$W(p,L) = \{\theta\in L_2[0,1] : \theta^{(p-1)} \text{ is abs. cont.},\ \|\theta^{(p)}\|_2^2 \le L^2\}.$$

The periodic Sobolev class $W^{per}(p,L)$ is the following subclass of all periodic functions with period 1 satisfying the boundary conditions indicated:

$$W^{per}(p,L) = \{\theta\in W(p,L) : \theta^{(r)}(0) = \theta^{(r)}(1),\ r = 0,\ldots,p-1\}.$$

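The problem is made discrete by representing a periodic Sobolev class as a Sobolev ellipsoid of $\ell_2$ via the trigonometric series expansion. Let $\{\phi_j,\ j = 0,1,\ldots\}$ be the orthonormal trigonometric system of $L_2[0,1]$. For $x\in[0,1]$, $\phi_0(x) = 1$ and for $k\ge 1$, $\phi_{2k-1}(x) = \sqrt2\sin(2\pi kx)$, $\phi_{2k}(x) = \sqrt2\cos(2\pi kx)$. For $\theta\in L_2[0,1]$, let $\theta_j = \int_0^1\theta(x)\phi_j(x)\,dx$, $j\ge 0$, be the sequence of its Fourier coefficients. To ease the notation, let $\theta = (\theta_0,\theta_1,\ldots)$, $\phi(\cdot) = (\phi_0(\cdot),\phi_1(\cdot),\ldots)$ and $\theta\cdot\phi(\cdot) = \sum_{j=0}^\infty\theta_j\phi_j(\cdot) = \theta_0 + \sum_{j=1}^\infty\theta_j\phi_j(\cdot)$. Each $\theta$ having the series expansion

$$\theta(x) = \theta\cdot\phi(x) = \theta_0 + \sqrt2\sum_{k=1}^\infty[\theta_{2k-1}\sin(2\pi kx) + \theta_{2k}\cos(2\pi kx)], \qquad x\in[0,1],$$

lies in $W^{per}(p,L)$ if and only if $\theta$ belongs to the Sobolev ellipsoid of $\ell_2$,

$$E_p(Q) = \Big\{\theta\in\ell_2 : \sum_{j=0}^\infty v_j^{2p}\theta_j^2 \le Q\Big\},$$

with $Q = L^2/\pi^{2p}$ and

$$v_0 = 0, \qquad v_j = \begin{cases} j+1, & \text{for } j \text{ odd},\\ j, & \text{for } j \text{ even},\end{cases} \qquad j = 1,2,\ldots.$$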

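For $Q = \infty$, the Sobolev space $\{\theta\in\ell_2 : \sum_{j=0}^\infty v_j^{2p}\theta_j^2 < \infty\}$ is denoted by $E_p$. Setting $\psi(\theta) = \log(\int_0^1\exp\{\theta\cdot\phi(t)\}\,dt)$, the generic density can be rewritten as

$$f_\theta(x) = \exp\{\theta\cdot\phi(x) - \psi(\theta)\} = \frac{\exp\{\sum_{j=1}^\infty\theta_j\phi_j(x)\}}{\int_0^1\exp\{\sum_{j=1}^\infty\theta_j\phi_j(t)\}\,dt}, \qquad x\in[0,1].$$

The form of $f_\theta$ explains why $\mathscr{P}$, which will also denote the density class $\{f_\theta : \theta\in E_p(Q)\}$, is called an infinite-dimensional exponential family. Since $f_\theta$ does not depend on $\theta_0$, for any pair $\theta,\theta'\in E_p(Q)$, the corresponding probability measures $P_\theta$, $P_{\theta'}$ are such that $P_\theta\ne P_{\theta'}$ if and only if $\theta_j\ne\theta'_j$ for some $j\ge 1$. Sequences differing only in the first coordinate identify the same probability measure and, thus, form an equivalence class. For instance, for $f_0\in\mathscr{P}$, any $\theta_0\in E_p(Q)$ such that $f_{\theta_0} = f_0$ can be taken as a representative of the class. It is now useful to highlight the fact that the $f_\theta$'s are uniformly bounded and bounded away from zero. Let $\|\phi_j\|_\infty = \sup_{0\le x\le 1}|\phi_j(x)|$ be the supremum norm of $\phi_j$, $j\ge 0$. Note that $\|\phi_j\|_\infty = \sqrt2$ for all $j\ge 1$. Setting

$$A = \sum_{j=1}^\infty v_j^{-2p}, \qquad B = \sqrt{2QA},$$

for each $\theta\in E_p(Q)$, we have

$$\|\theta\cdot\phi - \theta_0\|_\infty \le \sqrt2\sum_{j=1}^\infty|\theta_j| \le \sqrt2\Big(\sum_{j=1}^\infty v_j^{2p}\theta_j^2\Big)^{1/2}\Big(\sum_{j=1}^\infty v_j^{-2p}\Big)^{1/2} \le B < \infty.$$

Consequently, $\sup_{\theta\in E_p(Q)}\|f_\theta\|_\infty \le e^{2B}$. Thus, the Hellinger distance between any pair $P_{\theta'}, P_\theta\in\mathscr{P}$, $d_H(P_{\theta'},P_\theta) = \{\int_0^1(f_{\theta'}^{1/2} - f_\theta^{1/2})^2\,d\lambda\}^{1/2}$, the Kullback-Leibler divergence $K(P_{\theta'}\|P_\theta) = K(f_{\theta'}\|f_\theta) = \int_0^1 f_{\theta'}\log(f_{\theta'}/f_\theta)\,d\lambda$ and the $L_2$-distance $\|f_{\theta'} - f_\theta\|_2$ are equivalent and can be interchangeably used as loss functions.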
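As a numerical illustration of the uniform bound just stated, the following Python sketch evaluates $f_\theta$ for a finitely supported coefficient sequence and checks that $e^{-2B}\le f_\theta\le e^{2B}$ on a grid. The helper names (`v`, `phi`, `density_on_grid`), the coefficient values, the midpoint-rule grid and the truncation of the series defining $A$ are assumptions made for the illustration; they are not part of the paper.

```python
import math


def v(j):
    """Ellipsoid weights: v_0 = 0, v_j = j + 1 for odd j, v_j = j for even j."""
    if j == 0:
        return 0
    return j + 1 if j % 2 == 1 else j


def phi(j, x):
    """Orthonormal trigonometric system on [0, 1]."""
    if j == 0:
        return 1.0
    k = (j + 1) // 2  # frequency shared by phi_{2k-1} and phi_{2k}
    angle = 2.0 * math.pi * k * x
    return math.sqrt(2.0) * (math.sin(angle) if j % 2 == 1 else math.cos(angle))


def density_on_grid(theta, grid_size=2000):
    """Evaluate f_theta on a midpoint grid; theta[0] is ignored because
    f_theta does not depend on the first coordinate."""
    xs = [(i + 0.5) / grid_size for i in range(grid_size)]
    unnorm = [
        math.exp(sum(theta[j] * phi(j, x) for j in range(1, len(theta))))
        for x in xs
    ]
    norm = sum(unnorm) / grid_size  # midpoint rule for the normalizing integral
    return xs, [u / norm for u in unnorm]


p = 2
theta = [0.0, 0.3, -0.2, 0.05, 0.02]  # hypothetical coefficients
# Smallest Q for which theta lies in E_p(Q).
Q = sum(v(j) ** (2 * p) * theta[j] ** 2 for j in range(len(theta)))
# A = sum_{j>=1} v_j^{-2p}, truncated at a large index (assumption).
A = sum(v(j) ** (-2 * p) for j in range(1, 100_000))
B = math.sqrt(2.0 * Q * A)

xs, f = density_on_grid(theta)
print(min(f) >= math.exp(-2.0 * B), max(f) <= math.exp(2.0 * B))
```

The bound is far from tight for this choice of coefficients; it is its uniformity over $E_p(Q)$ that makes the three loss functions comparable.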

The problem of estimating densities from exponential families has been studied by Crain [7, 8, 9, 10] from the frequentist perspective, where log-densities are generated by Legendre polynomials on $[-1,1]$. Verdinelli and Wasserman [21] have used the same model for Bayesian goodness-of-fit testing. Our goal is to develop Bayesian density estimators attaining the optimal rate of convergence in the minimax sense under Hellinger loss, which is well known to be $n^{-p/(2p+1)}$ (see, for example, [25], Corollary 1, page 1574),

$$\inf_{\hat f\in\mathscr{F}_n}\sup_{\theta\in E_p(Q)} E_\theta^n[K(f_\theta\|\hat f)] \asymp \inf_{\hat f\in\mathscr{F}_n}\sup_{\theta\in E_p(Q)} E_\theta^n[d_H^2(f_\theta,\hat f)] \asymp \inf_{\hat f\in\mathscr{F}_n}\sup_{\theta\in E_p(Q)} E_\theta^n[\|f_\theta - \hat f\|_2^2] \asymp n^{-2p/(2p+1)},$$

where $\mathscr{F}_n$ is the set of all estimators $\hat f$ for densities $f_\theta$ in $\mathscr{P}$ based on $n$ observations and the expectation is taken over the $n$-fold product measure of $P_\theta$. By writing $a_n\asymp b_n$, we mean that both $a_n\lesssim b_n$ and $b_n\lesssim a_n$, where $a_n\lesssim b_n$ if $a_n = O(b_n)$, namely, if there exists a constant $c$ such that $a_n\le cb_n$ for all large $n$. Hereafter, all symbols $O$ and $o$ will refer to asymptotics as $n\to\infty$. The posterior expected density, which will be referred to as the Bayes' estimator and denoted by $\hat f_n$ in what follows, is a natural and common procedure for density estimation. From the general theory concerning posterior rates of convergence, it is known that if the posterior distribution on $\mathscr{P}$ converges at the exponential rate $e^{-cn\varepsilon_n^2}$, where $\varepsilon_n$ is a positive sequence such that $\varepsilon_n\to 0$ and $n\varepsilon_n^2\to\infty$ as $n\to\infty$, then the Bayes' estimator converges to the true density $f_0$ in the Hellinger distance at least as fast as $\varepsilon_n$ (see, e.g., [14], pages 506-507). Therefore, it suffices to put priors on $\mathscr{P}$ such that the corresponding posterior distributions converge exponentially fast at the optimal rate $n^{-p/(2p+1)}$. Recall that for $P_0\in\mathscr{P}$, if $\Pi_n$ is a prior on $\mathscr{P}$ possibly depending on the sample size, the posterior converges at rate $\varepsilon_n$ (relative to $d_H$) if for every positive sequence $M_n\to\infty$ such that $M_n\varepsilon_n\to 0$, $\Pi_n(H^c_{M_n\varepsilon_n}(P_0)\,|\,X_1,\ldots,X_n)\to 0$ as $n\to\infty$, in probability or almost surely when sampling from $P_0$, where $H^c_{M_n\varepsilon_n}(P_0) = \{P_\theta\in\mathscr{P} : d_H(P_0,P_\theta) > M_n\varepsilon_n\}$. Since any prior on $E_p(Q)$ induces a prior on $\mathscr{P}$ via the map $\theta\mapsto f_\theta$, we can conveniently work with priors for the Fourier coefficients. Hereafter, we state a sufficient condition for posterior convergence at the optimal rate. The proof, deferred to the Appendix, relies on the fact that, in the present setting, Hellinger neighborhoods of $P_0$ translate into $\ell_2$-neighborhoods of $\theta_0$.

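Theorem 1. Let $\pi_n$ be a sequence of priors on $E_p(Q)$ and $\Pi_n$ the sequence of priors induced on $\mathscr{P}$. Suppose $\theta_0\in E_p(Q)$. Let $B_1^2 = e^{-2B}$ and $\varepsilon_n = n^{-p/(2p+1)}$. If for constants $c_1, c_2 > 0$,

$$(1)\qquad \pi_n\Big(\Big\{\theta : \sum_{j=1}^\infty(\theta_j - \theta_{0,j})^2 \le B_1^2\varepsilon_n^2\Big\}\Big) \ge c_2e^{-c_1n\varepsilon_n^2},$$

then for a sufficiently large constant $M > 0$,

$$\Pi_n(\{P_\theta : d_H(P_0,P_\theta) > M\varepsilon_n\}\,|\,X_1,\ldots,X_n)\to 0 \qquad\text{as } n\to\infty,$$

$P_0^\infty$-almost surely, where $P_0^\infty$ denotes the infinite product measure of $P_0$.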

We develop several priors yielding Bayes' estimators that attain the op- 
timal minimax rate. Preliminary ascertainment of consistency is based on 
results by Barron, Schervish and Wasserman [2], Walker and Hjort [23] and 
Walker [22], who have addressed the issue of consistency of posterior
distributions for infinite-dimensional exponential families generated by
orthonormal systems of bounded basis functions where the $\theta_j$'s are independent,
zero-mean normals with variances chosen to ensure that $f_\theta$ is a density with
prior probability one. Then i^(Poll^) < oo is a sufficient condition for strong 
consistency. 

We begin by considering the case where p is known. In Section 2, we 
show that a sample-size-dependent prior constructed from an infinite prod- 
uct of normals achieves the optimal rate provided the variances decay suf- 
ficiently fast. The corresponding Bayes' estimator attains the minimax rate 
over Sobolev ellipsoids. As shown in Section 3, it is also attained by the 
posterior expected density arising from a mixture of normals with mixing
weights on the family dimension k that are bounded below by a sequence 
exponentially decaying in k. Both estimators involve infinitely many basis 
functions. Thus, the need arises to develop priors on finite sets of coefficients. 
This implies truncating the series at a maximum number of components that 
is allowed to grow with sample size. Approximate density estimators are de- 
rived in Section 4. In Section 5, we consider the case where the degree of 
smoothness of $f_0$ is unknown. A prior on the smoothness parameter is
assigned that has finite support. Normal priors with dimension depending on
the smoothness parameter are combined into an overall distribution whose 
posterior is seen to converge at the best rate. An adaptive estimator is con- 
structed. Adaptive convergence rates for posterior distributions on infinite- 
dimensional exponential families generated by wavelets with coefficients in 
a Besov space have been studied by Huang [16]. The relationship between 
our work and this article is considered in Section 6, along with some other 
closing remarks. 



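2. Priors constructed from infinite normals. A prior for $\theta$ results from assuming independent, zero-mean normal coordinates. If we take $\theta_j\sim N(0,\tau_j^2)$, $j\ge 0$, with $\sum_{j=0}^\infty\tau_j^2 < \infty$, then the $\tau_j$'s must be specified so that the infinite product measure gives positive probability to $E_p$. Hereafter, we shall use $\lfloor x\rfloor$ ($\lceil x\rceil$) to mean the greatest (least) integer less (greater) than or equal to $x$. For each $n\ge 1$, let $\varepsilon_n = n^{-p/(2p+1)}$ and define $N_n = \lfloor(8Q/(B_1^2\varepsilon_n^2))^{1/(2p)}\rfloor$, with $B_1^2 = e^{-2B}$ as before. We omit the subscript $n$ in $N_n$. Let $\tau_0 = 0$, which corresponds to a point mass at zero for the prior of $\theta_0$. Also, let $\tau_j^2 = \sigma^2v_j^{-2q}$, with $q = p + 1/2$ for $j = 1,\ldots,N$, and $q = 2p + \alpha$, with $\alpha > 1/2$, for $j\ge N+1$. With this choice,

$$\sum_{j=0}^\infty v_j^{2p}\tau_j^2 = \sigma^2\sum_{j=1}^N v_j^{2p}v_j^{-(2p+1)} + \sigma^2\sum_{j=N+1}^\infty v_j^{2p}v_j^{-(4p+2\alpha)} < \infty;$$

hence, $\sum_{j=0}^\infty v_j^{2p}\theta_j^2$ converges almost surely; see (5.13) in [26], page 541. Let $\mu_n$ denote the sample-size-dependent prior

$$\mu_n(\theta) = \delta_0(\theta_0)\times\prod_{j=1}^N\frac{v_j^{p+1/2}}{\sigma}\,\varphi\Big(\frac{v_j^{p+1/2}\theta_j}{\sigma}\Big)\times\prod_{j=N+1}^\infty\frac{v_j^{2p+\alpha}}{\sigma}\,\varphi\Big(\frac{v_j^{2p+\alpha}\theta_j}{\sigma}\Big), \qquad \theta\in\mathbb{R}^\infty,$$

where $\delta_0$ denotes a point mass at zero, $\varphi$ stands for the standard normal density and $\mathbb{R}^\infty$ is the space of sequences of real numbers. Let $\pi_n$ be the restriction of $\mu_n$ to $E_p(Q)$,

$$(2)\qquad \pi_n(\theta) = \frac{\mu_n(\theta)I_{E_p(Q)}(\theta)}{\mu_n(E_p(Q))}, \qquad \theta\in\mathbb{R}^\infty.$$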

We prove that the posterior of $\Pi_n$, the prior induced on $\mathscr{P}$ by $\pi_n$, converges at the optimal rate. Henceforth, we may set $\sigma^2 = 1$ without loss of generality because the results of the following theorem and Corollary 1 are not affected by the value of $\sigma^2$ up to constants.
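The construction of the prior in this section can be sketched in a few lines of Python. The sketch computes the truncation point $N_n$, the prior standard deviations $\tau_j$ with $\sigma = 1$, and draws from the restriction to $E_p(Q)$ by rejection. The function names, the numerical values of $n$, $Q$, $B_1^2$ and $\alpha$, and the decision to set all coordinates beyond a finite index `J` to zero are assumptions made for the illustration only.

```python
import math
import random


def v(j):
    """Ellipsoid weights: v_0 = 0, v_j = j + 1 for odd j, v_j = j for even j."""
    if j == 0:
        return 0
    return j + 1 if j % 2 == 1 else j


def truncation_point(n, p, Q, B1_sq):
    """N_n = floor((8Q / (B_1^2 eps_n^2))^{1/(2p)}) with eps_n = n^{-p/(2p+1)}."""
    eps_sq = n ** (-2.0 * p / (2 * p + 1))
    return math.floor((8.0 * Q / (B1_sq * eps_sq)) ** (1.0 / (2 * p)))


def prior_sd(j, N, p, alpha, sigma=1.0):
    """tau_j = sigma * v_j^{-q}: q = p + 1/2 up to N, q = 2p + alpha beyond N."""
    q = p + 0.5 if j <= N else 2 * p + alpha
    return sigma * v(j) ** (-q)


def sample_restricted(n, p, Q, B1_sq, alpha=1.0, J=200, max_tries=10_000, rng=None):
    """Draw from mu_n (coordinates beyond J are set to zero, an approximation)
    and return the first draw that lies in the ellipsoid E_p(Q)."""
    rng = rng or random.Random()
    N = truncation_point(n, p, Q, B1_sq)
    for _ in range(max_tries):
        # theta_0 is a point mass at zero; the rest are independent normals.
        theta = [0.0] + [rng.gauss(0.0, prior_sd(j, N, p, alpha)) for j in range(1, J + 1)]
        if sum(v(j) ** (2 * p) * theta[j] ** 2 for j in range(1, J + 1)) <= Q:
            return theta, N
    raise RuntimeError("no draw landed in E_p(Q); increase Q or max_tries")


# Hypothetical values, chosen only to exercise the code.
theta, N = sample_restricted(n=500, p=2, Q=10.0, B1_sq=0.05, rng=random.Random(0))
print(N, len(theta))
```

Rejection sampling is used here because $\pi_n$ is defined as $\mu_n$ conditioned on $E_p(Q)$; it is practical only when $\mu_n(E_p(Q))$ is not too small.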

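Theorem 2. If $\theta_0\in E_p(Q)$, then for a sufficiently large constant $M > 0$,

$$\Pi_n(\{P_\theta : d_H(P_0,P_\theta) > Mn^{-p/(2p+1)}\}\,|\,X_1,\ldots,X_n)\to 0 \qquad\text{as } n\to\infty,$$

$P_0^\infty$-almost surely.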

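Proof. In virtue of Theorem 1, we only need to show that condition (1) is satisfied. Clearly,

$$J_n := \pi_n\Big(\Big\{\theta : \sum_{j=1}^\infty(\theta_j - \theta_{0,j})^2 \le B_1^2\varepsilon_n^2\Big\}\Big) \ge \mu_n\Big(\Big\{\theta\in E_p(Q) : \sum_{j=1}^\infty(\theta_j - \theta_{0,j})^2 \le B_1^2\varepsilon_n^2\Big\}\Big).$$

We show that for all large $n$,

$$(3)\qquad J_n \ge \mu_n(E_n),$$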

where 



with Co a positive constant depending on Oq to be suitably chosen as will 
be prescribed. To prove (3), it suffices to show that for each 9 G En, 

(i) eeEpiQ); 

(ii) ET=ii(^, - eo,jf < Bfel 

We start with (i). Let < 6o < Q be such that Ej^o^j^^oj = Q-So. By 
Schwarz's inequality, 

oo N N 

j=0 j=l j=l 



+ 2, 



N 

„2P 



N oo 



E-f^^.+ E 



i=Af+l 
<iN + l)^P^ + {Q-5o) 



R2 2 r2 2 

+ 2^(iV + l)2p-g^(Q - 60) + 

Note that if x>0, then for < < x, 

/ 1 \ / 1 \ 2p / 1 \ 

Fix K > 1 and let rii be the smallest n such that 1 < K < (8Q/(Sf e2))i/(2p) ^ 
For n > ni, 

(iV + 1)'^ < f 1 + ^ < 16^- 



Bfel\ ^ K) - Blel' 

Fix < r?o < iVQ - - (^o) and define Co = f6P+^Q/?7§. Let be the 
smallest n such that < ?7o/2. Obviously, ^2 depends on r]Q. For each 

n>n = max{ni, 712}, 



f:->|<| + (Q-<5o) + 2r?oy^ + |<(r/o + V^Q^)^<Q, 

which proves (i). We now turn to (ii). Using the inequality (a + 6)^ < 2{a? + 
6^), since Co > 2, we have 



< 



^^ + 2iV-2PQ<i?24. 



Hence, both (i) and (ii) are satisfied for all n>n. We now find a lower bound 
on fin{En)- By independence of the 9j^s, 

Jn>Pr(^|(^i,...,^^):E(%-0Oj)'<^|) 

X Pr(^|(^^H.„^^+2,...): E < ^}) 

— Jl,n X J2,n- 

Reasoning as in Lemma 4 of Shen and Wasserman [19], page 711, we obtain 
that 

/ ^ R2^2 \ 

(4) Ji,„ > e~(^Q+P+mN2-iP+mp^l^jy2 < 2:^(2iV)2f+i j , 
where $V_1,\ldots,V_N$ are independent, standard normal random variables. The probability on the right-hand side of (4) can be bounded below using Stirling's approximation. For ease of notation, let $\zeta_n = B_1^2\varepsilon_n^2/C_0$ and $d = p + 1/2$. Then



/ N \ 

\j=i J 



(^/2)iV/2-lg-7V/2^/^ 

Noting that {^Nf'^il < {16P+'^ Q / Cq) N = r]^N, we obtain that 

where c = 2Q+p + riQ — ^ log(Tyo/2^^^^) > 0. Let us consider J2,n- By Markov's 
inequality, 

Blel^^J BlelN^v-2 

for all large n. Combining lower bounds on Ji^„ and J2,n) we obtain that for 
ci = 2c(8Q/5f and all large n, 

•Jn ^ 'Jl,n ^ 'J2,n ;C ^ ) 

which completes the proof. □ 

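Corollary 1. If $\hat f_n$ is the Bayes' estimator arising from prior (2), then for any $0 < Q' < Q$,

$$\sup_{\theta_0\in E_p(Q')} E^n_{\theta_0}[d_H^2(f_{\theta_0},\hat f_n)] \asymp n^{-2p/(2p+1)}.$$

Proof. Note that for each $\theta_0\in E_p(Q')$, choosing $0 < \eta_0 < (\sqrt{Q} - \sqrt{Q'})$, Theorem 2 applies with constants that do not depend on the specific point $\theta_0$. Thus, as a byproduct of Theorem A.1, for suitable constants $M, C, c > 0$ and sufficiently large $n$,

$$\sup_{\theta_0\in E_p(Q')} E^n_{\theta_0}[\Pi_n(\{P_\theta : d_H(P_{\theta_0},P_\theta) > M\varepsilon_n\}\,|\,X_1,\ldots,X_n)] \le Ce^{-cn\varepsilon_n^2}.$$

By Theorem 5 of Shen and Wasserman [19], page 694,

$$\sup_{\theta_0\in E_p(Q')} E^n_{\theta_0}[d_H^2(f_{\theta_0},\hat f_n)] \le M^2\varepsilon_n^2 + 2Ce^{-cn\varepsilon_n^2} \lesssim \varepsilon_n^2,$$

which, combined with

$$\varepsilon_n^2 \lesssim \sup_{\theta_0\in E_p(Q')} E^n_{\theta_0}[d_H^2(f_{\theta_0},\hat f_n)],$$

yields the assertion. □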

Remark 1. Corollary 1 shows that prior (2) yields a Bayes' density
estimator attaining optimal minimax rate over any ellipsoid Ep{Q'), with 
Q' < Q. Theorem 2 and Corollary 1 are of interest because they establish 
that, for the problem under consideration, in contrast to the infinitely many 
normal means problem considered in [26], a sample-size-dependent direct 
Gaussian prior yields a Bayes' estimator attaining optimal minimax rate 
provided the prior variances die off sufficiently rapidly. 

3. Sieve priors. In this section, we consider sieve priors restricted to $E_p(Q)$. Sieve priors have been used by Zhao [26] and Shen and Wasserman [19]. The basic idea is to put a prior on the dimension of the exponential family, hereafter denoted by $k$. Before describing the hierarchical structure of a sieve prior, we introduce some more notation. Henceforth, for any integer $N\ge 1$, let $\theta_N = (\theta_0,\ldots,\theta_N,0,0,\ldots)$ denote a sequence such that all but possibly the first $N+1$ coordinates are equal to zero. Also, let $E_{p,N}(Q) = \{\theta_N : \sum_{j=0}^N v_j^{2p}\theta_j^2 \le Q\}$. Clearly, $E_{p,N}(Q)\subset E_p(Q)$.

(i) Conditionally on $k\ge 1$ and $\theta$, for each $n\ge 1$, the random variables $X_1,\ldots,X_n$ are i.i.d., with density

$$f_{\theta_k}(x) = \frac{\exp\{\sum_{j=1}^k\theta_j\phi_j(x)\}}{\int_0^1\exp\{\sum_{j=1}^k\theta_j\phi_j(t)\}\,dt}, \qquad x\in[0,1];$$

(ii) conditionally on $k$, the sequence $\theta$ has distribution $\mu_k$, which makes the coordinates independent and such that $\theta_0 = 0$, $\theta_j\sim N(0,v_j^{-(2p+1)})$, $j = 1,\ldots,k$, and $\theta_j$ is degenerate at 0 for all $j > k$;

(iii) the exponential family dimension $k$ has distribution $\{\lambda(k),\ k = 1,2,\ldots\}$ with $\lambda(k)\ge Ae^{-\gamma k}$, $k\ge 1$, for some $A,\gamma > 0$.

Let $\pi$ denote the restriction of the sieve prior $\mu = \sum_{k=1}^\infty\lambda(k)\mu_k$ to $E_p(Q)$,

$$(5)\qquad \pi(\theta) = \frac{\mu(\theta)I_{E_p(Q)}(\theta)}{\mu(E_p(Q))}, \qquad \theta\in\mathbb{R}^\infty,$$

where $\mu(E_p(Q)) = \sum_{k=1}^\infty\lambda(k)\mu_k(E_{p,k}(Q))$. Next, we study the convergence rate for the posterior of the prior $\Pi$ induced by $\pi$ on $\mathscr{P}$.
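The hierarchical structure (i)-(iii) can be simulated directly. The Python sketch below draws the dimension $k$ from geometric weights $\lambda(k) = (e^\gamma - 1)e^{-\gamma k}$, one admissible choice satisfying $\lambda(k)\ge Ae^{-\gamma k}$ with $A = e^\gamma - 1$, then draws the coefficients and keeps the pair only if it lies in $E_{p,k}(Q)$. The geometric form of the weights, the function names and the numerical values are assumptions made for the illustration.

```python
import math
import random


def v(j):
    """Ellipsoid weights: v_0 = 0, v_j = j + 1 for odd j, v_j = j for even j."""
    if j == 0:
        return 0
    return j + 1 if j % 2 == 1 else j


def draw_dimension(gamma, rng):
    """Draw k >= 1 with Pr(k) = (e^gamma - 1) e^{-gamma k} by inversion."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return 1 + int(math.log(u) / -gamma)


def draw_sieve_prior(p, Q, gamma, rng, max_tries=10_000):
    """Rejection sampler for the sieve prior restricted to E_p(Q)."""
    for _ in range(max_tries):
        k = draw_dimension(gamma, rng)
        # theta_0 = 0 and theta_j ~ N(0, v_j^{-(2p+1)}) for j = 1, ..., k.
        theta = [0.0] + [rng.gauss(0.0, v(j) ** (-(p + 0.5))) for j in range(1, k + 1)]
        if sum(v(j) ** (2 * p) * theta[j] ** 2 for j in range(1, k + 1)) <= Q:
            return k, theta
    raise RuntimeError("no draw landed in E_p(Q); increase Q or max_tries")


rng = random.Random(1)
k, theta = draw_sieve_prior(p=2, Q=10.0, gamma=0.1, rng=rng)
print(k, len(theta))
```

Accepting a joint draw of $(k,\theta)$ only when $\theta\in E_{p,k}(Q)$ reproduces the normalization by $\mu(E_p(Q))$ in (5).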

Theorem 3. If $\theta_0\in E_p(Q)$, then for a sufficiently large constant $M > 0$,

$$\Pi(\{P_\theta : d_H(P_0,P_\theta) > Mn^{-p/(2p+1)}\}\,|\,X_1,\ldots,X_n)\to 0 \qquad\text{as } n\to\infty,$$

$P_0^\infty$-almost surely.
Proof. We appeal to Theorem 1. Note that for N = \{2Q/{Bfel)y/^'^P^] , 



j=N+l j=N+l 

Let < 5o < Q be such that Ej^o^?^o,i = Q-5o.U n is sufficiently large 
that (2(5/XB^4)I^/(2p) > I j^j^^ a positive constant such that Bl < 

B^il - ^l- So/Q)y2^P+\ then 



(b) 



N ^ 

< j=i i 



Using (a) and (b), 

>A(iVwf|0JvGi^P,7v(Q):E(^J-^Oj)'+ E C<^?4|) 
\ I i=i j=N+\ J / 

> A(iV);U;v {[on : E(^i - ^0,,)' < B\e^})^ ^ A(iV)/i,„. 

Let Cn = and d = p + 1/2. Noting that (2iV)2'^Cn < {lG'^QBf/Bf)N and 
X]jLi ^^I'^^oj < 2(5A^, by Lemma 4 of Shen and Wasserman [19], page 711, 
and using Stirling's approximation, we obtain that 



T ^ -(2Q+d)Nn-(d+l/2)N ^ f 

^ r(iv/2) Jo 



r(A^/2) 

(Af/2)^/2-ie-^/2V^~ 
where c = 2Q+p + IG'^QBl/Bf - \ \og{2QBl/Bl) > 0. Therefore, 
(6) /„ > A(iV)/i,„ > e-(^+^)^ > e-2{7+^){2Q/B?)^/(^'''.„-^/'' = ^-cne 
with ci = 2(7 + c)(2Q/S^)i/(2p)^ and condition (1) is satisfied. □ 




Remark 2. An examination of the proof of Theorem 3 reveals that posterior convergence at the optimal rate depends on the assumed tail behavior of the mixing weights, which are bounded below by an exponentially decreasing sequence. This requirement is used in (6) to guarantee that $\varepsilon_n$-Hellinger-type neighborhoods of $P_0$ have prior mass at least of the order of $e^{-c_1n\varepsilon_n^2}$.

Corollary 2. If $\hat f_n$ is the Bayes' estimator arising from prior (5), then for any $0 < Q' < Q$,

$$\sup_{\theta_0\in E_p(Q')} E^n_{\theta_0}[d_H^2(f_{\theta_0},\hat f_n)] \asymp n^{-2p/(2p+1)}.$$

Proof. It suffices to check that the convergence of the posterior is uniform over $E_p(Q')$. More formally, for each $\theta_0\in E_p(Q')$, Theorem 3 applies, with constants depending only on $Q$ and $Q'$, so that for suitable $M, C, c > 0$,

$$\sup_{\theta_0\in E_p(Q')} E^n_{\theta_0}[\Pi(\{P_\theta : d_H(P_{\theta_0},P_\theta) > M\varepsilon_n\}\,|\,X_1,\ldots,X_n)] \le Ce^{-cn\varepsilon_n^2}.$$

Note that for $\delta' = Q - Q' > 0$, we have $\sum_{j=0}^\infty v_j^{2p}\theta_{0,j}^2 \le Q' = Q - \delta'$ for all $\theta_0\in E_p(Q')$. Thus, Theorem 3 applies with $B_2^2 \le B_1^2(1 - \sqrt{1 - \delta'/Q})^2/2^{2p+1}$. The assertion then follows via reasoning similar to that used in the proof of Corollary 1. □

Remark 3. Corollary 2 demonstrates that the Bayes' estimator attains the minimax rate of convergence under Hellinger loss over any ellipsoid $E_p(Q')$, with $Q' < Q$.

4. Sample-size-dependent priors and density estimators. Bayes' estimators arising from priors (2) and (5) involve infinitely many terms. To avoid the use of infinite bases, we define priors supported on exponential families whose dimension varies with sample size at a carefully chosen rate. Let $N_n$ be a sequence of positive integers, to be specified below. To ease the notation, we omit the subscript $n$ in $N_n$. For each $n\ge 1$, let $\mu_N$ be the prior that makes the coordinates independent and such that $\theta_0 = 0$, $\theta_j\sim N(0,v_j^{-(2p+1)})$, $j = 1,\ldots,N$, and $\theta_j$ is degenerate at 0 for all $j > N$. Let

$$(7)\qquad \pi_n(\theta_N) = \frac{\mu_N(\theta_N)I_{E_{p,N}(Q)}(\theta_N)}{\mu_N(E_{p,N}(Q))}, \qquad \theta_N\in\mathbb{R}^\infty,$$

be the restriction of $\mu_N$ to $E_{p,N}(Q)$ and let $\Pi_n$ denote the induced prior on $\mathscr{P}_N = \{f_{\theta_N} : \theta_N\in E_{p,N}(Q)\}$, where $f_{\theta_N} = e^{\theta_N\cdot\phi - \psi(\theta_N)}$.
Theorem 4. Let $N = \lfloor(2Q/B_1^2)^{1/(2p)}n^{1/(2p+1)}\rfloor$. If $\theta_0\in E_p(Q)$, then for a sufficiently large constant $M > 0$,

$$\Pi_n(\{P_\theta : d_H(P_0,P_\theta) > Mn^{-p/(2p+1)}\}\,|\,X_1,\ldots,X_n)\to 0 \qquad\text{as } n\to\infty,$$

$P_0^\infty$-almost surely.

Proof. The proof of Theorem 3 carries over to this case with simple 
modifications. □ 

Remark 4. The assertion of Theorem 4 also holds true for the truncated sieve prior

$$(8)\qquad \pi_n(\theta_N) = \frac{\mu_n(\theta_N)I_{E_{p,N}(Q)}(\theta_N)}{\mu_n(E_{p,N}(Q))}, \qquad \theta_N\in\mathbb{R}^\infty,$$

where for each $n\ge 1$, $\mu_n = \sum_{k=1}^N\lambda_n(k)\mu_k$, with $\lambda_n(k)\ge A_1e^{-\gamma k}$, $k = 1,\ldots,N$, and $\sum_{k=1}^N\lambda_n(k) = 1$. A uniform version of Theorem 4 can be formulated for priors (7) and (8) so that the corresponding Bayes' estimators attain the minimax rate.

In the next proposition, approximations for the Bayes' estimators arising 
from priors (7) and (8) are provided. 

Proposition 1. If for given (large) $n$, $Q$ is such that $\mu_N(E_{p,N}(Q))\approx 1$ ($\mu_n(E_{p,N}(Q))\approx 1$), then the Bayes' estimators arising from priors (7) and (8) can be approximated by
(9) <^i,nexp<^ T^f' ^^[O'l]' 

1^ ^ j=i ^'j + n + 1 J 

and 

(10) C2,.f:A4%n(fc)exp|if %M±:^1 ^e[0,l], 

respectively, where $N$ is defined as in Theorem 4, $\bar\phi_j = n^{-1}\sum_{i=1}^n\phi_j(X_i)$, $j = 1,\ldots,N$, $\rho_n(k) = \prod_{j=1}^k(1 + (n+1)v_j^{-(2p+1)})^{-1/2}$, $k = 1,\ldots,N$, and $c_{1,n}$, $c_{2,n}$ stand for the normalizing constants.

Proof. First, note that for given $n$, if $Q$ is sufficiently large, then $\mu_N(E_{p,N}(Q))\approx 1$. To see this, observe that since $\theta_0$ is degenerate at zero, the probability $\mu_N(E_{p,N}(Q))$ is bounded below by the left tail of the chi-square distribution with $N$ degrees of freedom,

$$\mu_N(E_{p,N}(Q)) \ge \mu_N\Big(\Big\{\theta_N : \sum_{j=1}^N v_j^{2p+1}\theta_j^2 \le Q\Big\}\Big) = \Pr(\chi^2_N \le Q).$$
[Figure 1: two panels, each with horizontal axis x ranging over [0, 1].]

Fig. 1. True density (solid line) and its approximate Bayes' estimate (9) (dotted line) on the left. True density (solid line) and its approximate Bayes' estimate (10) (dotted line) on the right.



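Similarly, $\mu_n(E_{p,N}(Q)) \ge \sum_{k=1}^N\lambda_n(k)\Pr(\chi^2_k \le Q) \ge \Pr(\chi^2_N \le Q)$ because the chi-square distribution is stochastically increasing in its degrees of freedom. We now derive (9). Setting $L_n(f_{\theta_N}) = \prod_{i=1}^n f_{\theta_N}(X_i)$, the posterior expected density can be written as

$$\hat f_n(x) = \frac{E[f_{\theta_N}(x)L_n(f_{\theta_N})]}{\int_0^1 E[f_{\theta_N}(s)L_n(f_{\theta_N})]\,ds}, \qquad x\in[0,1],$$

where $E$ stands for expectation under prior (7). Since $\mu_N(E_{p,N}(Q))\approx 1$, $\mu_N$ can be thought of as a prior on $\mathscr{P}_N$ and

$$E[f_{\theta_N}(x)L_n(f_{\theta_N})] \approx \int f_{\theta_N}(x)L_n(f_{\theta_N})\,\mu_N(d\theta_N), \qquad x\in[0,1].$$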

Since n is large, we can proceed as in Corollary 1 of Lenk [17], pages 534- 
535 (see also pages 541-542), and approximate e^"^^^''''^^^) using the CLT. 
Straightforward computations then lead to (9). Approximation (10) may be 
proved similarly. □ 

Remark 5. The number of terms $N = O(n^{1/(2p+1)})$ used in (9) and (10) is of the same order as the dimension, say $N^*$, of the exponential family employed to define the density estimator proposed by Barron and Sheu [3], when the log-density is in the periodic Sobolev space $W^{per}(p,\infty)$. Such an estimator, say $\hat f$, is defined to maximize the likelihood in the $N^*$-dimensional exponential family and is shown to converge to $f_0$ in the sense of relative entropy (Kullback-Leibler divergence) at rate $O_p(n^{-2p/(2p+1)})$, that is, $K(f_0\|\hat f) = O_p(n^{-2p/(2p+1)})$.

The plots in Figure 1 show approximate Bayes' estimates (9) on the left-hand side and (10) on the right-hand side for the density function

$$f_0(x) = \frac{\exp\{\sin(\pi x)\}}{\int_0^1\exp\{\sin(\pi t)\}\,dt}, \qquad x\in[0,1],$$

based on $n = 500$ observations. We took $p = 2$, $N = O(n^{1/5})$ and $\lambda_n(k)\propto e^{-\gamma k}$, with $\gamma = 0.1$. Both estimates, which appear very similar, are close to the true density.
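For concreteness, the settings of this example can be turned into numbers. The Python sketch below computes a truncation point of order $n^{1/(2p+1)}$ and the normalized mixing weights $\lambda_n(k)\propto e^{-\gamma k}$, $k = 1,\ldots,N$. The text only specifies the order $N = O(n^{1/5})$; taking $N = \lceil n^{1/(2p+1)}\rceil$ with constant one is an assumption, as are the function names.

```python
import math


def truncation_dimension(n, p):
    """Assumed truncation rule: N = ceil(n^{1/(2p+1)})."""
    return math.ceil(n ** (1.0 / (2 * p + 1)))


def truncated_weights(N, gamma):
    """Mixing weights lambda_n(k) proportional to e^{-gamma k}, k = 1, ..., N."""
    raw = [math.exp(-gamma * k) for k in range(1, N + 1)]
    total = sum(raw)
    return [r / total for r in raw]


N = truncation_dimension(n=500, p=2)
weights = truncated_weights(N, gamma=0.1)
print(N, [round(w, 4) for w in weights])
```

With only a handful of components and $\gamma = 0.1$, the weights are nearly uniform, which is consistent with the two estimates in Figure 1 looking very similar.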

5. Rate adaptation. Thus far, we have assumed that the degree of smoothness, $p$, of $f_0$ is known. We now suppose that this is unknown and denote its value by $p_0$. In accordance with the Bayesian approach, we may consider $p$ as a hyperparameter and assign it a prior distribution. Let $P = \{p_{\underline m},\ldots,p_{-1},p_0,p_1,\ldots,p_{\overline m}\}$ be a finite set of possible values for $p$, with $1\le p_{\underline m} < \cdots < p_{-1} < p_0 < p_1 < \cdots < p_{\overline m}$. Let $M = \{\underline m,\ldots,-1,0,1,\ldots,\overline m\}$ be the corresponding index set. For any $m\in M$, let $N_m = \lceil n^{1/(2p_m+1)}\rceil$, where the subscript $m$ is introduced to stress the dependence on $p_m$. We consider the following hierarchical prior. For each $n\ge 1$,

(i) conditionally on $p = p_m$ and $\theta$, the random variables $X_1,\ldots,X_n$ are i.i.d. with density

$$f_{\theta_{N_m}}(x) = \frac{\exp\{\sum_{j=1}^{N_m}\theta_j\phi_j(x)\}}{\int_0^1\exp\{\sum_{j=1}^{N_m}\theta_j\phi_j(t)\}\,dt}, \qquad x\in[0,1];$$

(ii) conditionally on $p = p_m$, $\theta$ has distribution $\mu_{N_m}$, which makes the coordinates independent and such that $\theta_0 = 0$, $\theta_j\sim N(0,v_j^{-(2p_m+1)})$, $j = 1,\ldots,N_m$, and $\theta_j$ is degenerate at 0 for all $j > N_m$;

(iii) $p$ has distribution $w(m) = \Pr(p = p_m) > 0$ for all $m\in M$.
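The hierarchical prior (i)-(iii) can be sketched as follows in Python: for each candidate smoothness the model dimension is $N_m = \lceil n^{1/(2p_m+1)}\rceil$, a smoothness value is drawn according to the weights $w(m)$, and the coefficients are then drawn from the corresponding normal prior. The candidate set $\{1,2,3\}$, the uniform weights and the function names are assumptions chosen for illustration.

```python
import math
import random


def v(j):
    """Ellipsoid weights: v_0 = 0, v_j = j + 1 for odd j, v_j = j for even j."""
    if j == 0:
        return 0
    return j + 1 if j % 2 == 1 else j


def model_dimensions(n, smoothness_grid):
    """N_m = ceil(n^{1/(2 p_m + 1)}) for every candidate smoothness p_m."""
    return {p: math.ceil(n ** (1.0 / (2 * p + 1))) for p in smoothness_grid}


def draw_adaptive_prior(n, smoothness_grid, weights, rng):
    """Draw p from w, then theta_j ~ N(0, v_j^{-(2p+1)}) for j = 1, ..., N_m."""
    p = rng.choices(smoothness_grid, weights=weights, k=1)[0]
    N = model_dimensions(n, [p])[p]
    theta = [0.0] + [rng.gauss(0.0, v(j) ** (-(p + 0.5))) for j in range(1, N + 1)]
    return p, theta


grid = [1, 2, 3]  # hypothetical candidate values for p
dims = model_dimensions(500, grid)
p_drawn, theta = draw_adaptive_prior(500, grid, [1 / 3, 1 / 3, 1 / 3], random.Random(2))
print(dims, p_drawn, len(theta) - 1)
```

Smoother candidates lead to lower-dimensional models, so a posterior that favours too small a value of $p$ would select a model with more coefficients than the best one.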

The overall prior is $\pi_n = \sum_{m\in M}w(m)\mu_{N_m}$. Let $\Pi_n$ be the induced prior on $\bigcup_{m\in M}\{f_{\theta_{N_m}} : \theta_{N_m}\in\mathbb{R}^\infty\}$. Our goal is to show that this mixture prior achieves the rate of convergence $n^{-p_0/(2p_0+1)}$ whenever $\theta_0\in E_{p_0}$, with $p_0\in P$. We need to introduce further notation. For each $j\ge 1$, let $E_0[\phi_j(X_1)]$ and $V_0[\phi_j(X_1)]$ be the expected value and variance of $\phi_j(X_1)$ w.r.t. $P_0$, respectively. Note that $E_0[\phi_j(X_1)]\le\sqrt2$ and $V_0[\phi_j(X_1)]\le 2$ for all $j\ge 1$. The conditions

$$(11)\qquad \sum_{j=1}^\infty v_j^{2p_0}(E_0[\phi_j(X_1)])^2 < \infty,$$

$$(12)\qquad \sum_{j=1}^\infty V_0[\phi_j(X_1)] < \infty$$

are assumed to be in force in what follows. We are now in a position to state the main result of this section.

Theorem 5. Suppose $p_0\in P$. If $\theta_0\in E_{p_0}$ satisfies conditions (11) and (12), then for a sufficiently large constant $M > 0$,

$$\Pi_n(\{P_\theta : d_H(P_0,P_\theta) > Mn^{-p_0/(2p_0+1)}\}\,|\,X_1,\ldots,X_n)\to 0$$

in $P_0^\infty$-probability as $n\to\infty$.

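Proof. The idea is to show that the posterior mass will ultimately be lying in a Sobolev ellipsoid. This will drastically reduce the effective parameter space, allowing us to apply the theory developed above. Let $\varepsilon_n = n^{-p_0/(2p_0+1)}$. Define $\tilde w(m) = w(m)/\sum_{l\ge 0}w(l)$, for $m = 0,\ldots,\overline m$, and let $\tilde\pi_n = \sum_{m\ge 0}\tilde w(m)\mu_{N_m}$. For any $Q > 0$,

$$\begin{aligned}
U_n &= \Pi_n(\{P_\theta : d_H(P_0,P_\theta) > M\varepsilon_n\}\,|\,X_1,\ldots,X_n)\\
&= \pi_n(\{\theta : d_H(P_0,P_\theta) > M\varepsilon_n\}\,|\,X_1,\ldots,X_n)\\
&= \Pr(\{\theta : d_H(P_0,P_\theta) > M\varepsilon_n\},\ p < p_0\,|\,X_1,\ldots,X_n)\\
&\quad + \Pr\Big(\Big\{\theta : \sum_{j=0}^{N_0}v_j^{2p_0}\theta_j^2 > Q,\ d_H(P_0,P_\theta) > M\varepsilon_n\Big\},\ p\ge p_0\,\Big|\,X_1,\ldots,X_n\Big)\\
&\quad + \Pr\Big(\Big\{\theta : \sum_{j=0}^{N_0}v_j^{2p_0}\theta_j^2 \le Q,\ d_H(P_0,P_\theta) > M\varepsilon_n\Big\},\ p\ge p_0\,\Big|\,X_1,\ldots,X_n\Big)\\
&\le \Pr(p < p_0\,|\,X_1,\ldots,X_n) + \tilde\pi_n\Big(\Big\{\theta : \sum_{j=0}^{N_0}v_j^{2p_0}\theta_j^2 > Q\Big\}\,\Big|\,X_1,\ldots,X_n\Big)\\
&\quad + \tilde\pi_n\Big(\Big\{\theta : \sum_{j=0}^{N_0}v_j^{2p_0}\theta_j^2 \le Q,\ d_H(P_0,P_\theta) > M\varepsilon_n\Big\}\,\Big|\,X_1,\ldots,X_n\Big)\\
&=: U_n^{(1)} + U_n^{(2)} + U_n^{(3)}.
\end{aligned}$$

If $U_n^{(r)}\to 0$ in probability for $r = 1,2,3$, then $U_n\to 0$ in probability, where all 'in probability' statements are understood to be w.r.t. $P_0^\infty$. The proof is split into three main steps.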

We begin by showing that $U_n^{(1)}\to 0$ in probability, namely, that the posterior probability of selecting a model coarser than the best one tends to zero in probability. Note that if $p_0 = 1$, then $U_n^{(1)} = \Pr(p < 1\,|\,X_1,\ldots,X_n) = 0$ a.s. $[P_0^\infty]$ for all $n\ge 1$. For $p_0\ge 2$, since $w(m) > 0$ for all $m\in M$,



^(1) 



< 



1 



w{0) 



J2 

m<0 



m] 



jmife^^iX^)mmidO 



Nrr 



IIli=ifeNo{Xi)lJ-No{dONo) 
where the set of integration is understood to be the whole domain. Let 
Since P is a finite set, for some m* < 0, 

tt(1) 1 / \ ^ ^^^^m.<0 Rm,n R-m* ,n a 

' < — 77TT w{m)Rni^n < Jjr. = —77^ = —77^- 

w{0)^^^ w{0) w{0) w{0) 

It suffices to show that Sn = op{l). Using the approximation e'"'^^^^™-* ~ 

g 2 Z^j=i j> which is vahd for all m G M and where a„ ~ 6„ means that 
o-n/bn ^ 1 as n — > 00, we obtain that 5„ = + op(l), with 

-(2po+l) , 1/2 



n 



l_^^-{2p„+l) 



i=l ^™ 

X n (1+ 



exp 



No 



nv, 



-{2p„+l)^-l/2 



exp 



5 E 

j=7Vo+l 



j=JVo+l 

where, for simplicity, we have written $m$ instead of $m^*$ and where for $m < 0$,

$$b_{j,n} = n^2[(n + v_j^{2p_m+1})^{-1} - (n + v_j^{2p_0+1})^{-1}] \ge 0, \qquad 1\le j\le N_0,\ n\ge 1.$$

For later use, note that



(13) 



bj,n < 



2po+l 



n. 



for n > 1, 

for 1 < j < iVo. 



< 



Recalling the definition of (j)j in Proposition 1, from the inequalities 
2(0j -Eo[</'j(Xi)])2 + 2(Eo[</'j(Xi)])2, for ah j > 1, and x(l + < log(l + 
x) < X, valid for all x > —1, it follows that 



1 



rn<exp<^ 



^-(2po+l) _ ^-(2p,„+l) 



j=l L 



-(2p™+l) 



+ 26,-„,(Eo[(Ai(Xi)])^ 



exp^6,-„(,^,--Eo[</.,-(Xi)])^ 



1 



N„ 



X exp-^ 



2 S 

j=Afo+l 



1 



2n2(Eo[<Ai(^i)])' 



l + „-i^;2p™+i 



2pm + l 



[j=No+l 



n2(0, -Eq (Xi)])- 



A ^(1) ^(2) ^(3) ^(4) 

J„ -^-t„ AJ.„ -^-tn 
If $\prod_{r=1}^4 T_n^{(r)} = o_P(1)$, then $T_n = o_P(1)$ and, consequently, $S_n = o_P(1)$. We prove that $T_n^{(1)} = o(1)$. Let $D_0 = \max\{1,\sum_{j=1}^\infty v_j^{2p_0}(E_0[\phi_j(X_1)])^2\}$. Clearly, $D_0 < \infty$ due to (11). Let $n_0$ be the smallest $n$ such that $N_0\ge 2$. For $n\ge n_0$ and $1\le k_n < N_0$ to be specified shortly, recalling (13), we have

Afo k„ 

j=i j=i 

No 
j=k„+l 

No 

j = kn + l 

Taking kn = [(128L'o)~^"'^^^^*''''^^''l , for n > max{?io> "'^iji with ni the small- 
est n such that (1281)0)"^^^/^^^""*"^^ > 2, we have that Vk^D^ < J^n^/^^po+i) 
and nt;^^^^" < (128L>o)^^°nV(2po+i). Yot n > max{no, ni, 7^2}, with n2 the 
smallest n such that ^?°(IEo[0j(^i)])^ < ^(128L»o)-2po, we obtain 

that E^^i&i,n(Eo[</'j(Xi)])2 < ^ni/(2po+i). Now, note that for m < 0, 



^-{2po+l) _ ^-(2p™+l) ^ 



0, fori>l, 

_ . (2p„+i)^ for j>J= p2i/[2bo-P-i)]] 



so that 



^ TV,, ^-(2po+l) _ ^7(2p™+l) ^ L„l/(2P0+l)-lJ ^--(2p„+l) 



_V-^ ^ <-- V 



1 n ^ + f^- * j= J n + f J 



(2p™+l) ' 



For 2 < [,2i/{2po+i) _ ij^ we have n"^ < w^- Also, for n > = 

[(2(J + l))2po+i], we have J + 1 < in^/^^po+i)^ -pj-^^g^ fo^ „ > max{no,ni, 
'T'2 5''^3}) combining previous facts, we obtain that 

L„1/(2P0 + 1)-1J --(2p„+l) >, 

0<rW<exp -- y + _LnV(2po+i) I 

< exp(--(Lni/(2po+i) - IJ - (J - 1)) + lni/(2po+i)| 
[, 8 32 J 

<exp(--nV(2po+i) 
I 32 
Hence, $T_n^{(1)}\to 0$ as $n\to\infty$. We claim that $T_n^{(2)}\to 1$ in probability. For any $\eta > 0$, by Markov's inequality,

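By the reverse of Fatou's lemma, the right-hand side goes to zero as $n\to\infty$. To see this, let $\mu$ denote the counting measure on $\mathbb{N}$, endowed with the $\sigma$-field $\mathcal{P}(\mathbb{N})$ of all subsets of $\mathbb{N}$. For each $n\ge 1$, letting

$$f_n(s) = n^{-1}b_{s,n}V_0[\phi_s(X_1)]I_{\{1,\ldots,N_0\}}(s), \qquad s\ge 1,$$

we can write $\sum_{j=1}^{N_0}n^{-1}b_{j,n}V_0[\phi_j(X_1)] = \int_{\mathbb{N}}f_n(s)\,\mu(ds)$. Note that $\{f_n(\cdot),\ n = 1,2,\ldots\}$ is a sequence of nonnegative, $\mathcal{P}(\mathbb{N})$-measurable functions such that for every $n\ge 1$,

$$f_n(s)\le V_0[\phi_s(X_1)], \qquad s\ge 1,$$

with $\int_{\mathbb{N}}V_0[\phi_s(X_1)]\,\mu(ds) = \sum_{j=1}^\infty V_0[\phi_j(X_1)] < \infty$, due to (12), and

$$\lim_{n\to\infty}f_n(s) = 0, \qquad s\ge 1.$$

Then,

$$\limsup_{n\to\infty}\int_{\mathbb{N}}f_n(s)\,\mu(ds) \le \int_{\mathbb{N}}\limsup_{n\to\infty}f_n(s)\,\mu(ds) = 0.$$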

Therefore, Z^j^i bj,n{4>j — ^o[4'ji^i)])'^ ~^ and by the continuous mapping 
theorem (CMT), Tn^^ 1. We show that Tn^^ = o(l). Let be the smahest 
n such that Y.T=No+i'"T° i^o[<Pj{Xi)]f < 4-(po+2) and ng the smahest n 
such that for m < 0, (Nm/No — 1) > 1. Then for n > maxjns, 714, 715}, 

Thus, T^^-* — > as n — > 00. We prove that T,!^^ = Op(l). For any r] > 0, by 
Markov's inequality, 

\j=No+l n + Vj I 'I j=No+l 

where the right-hand side goes to zero as n — > 00. By the CMT, T^^^ ^ 1. 

p (1) p 

Combining all previous results, T„ ^ 0, hence, ?7A ^ 0. 
The second step consists in showing that for a sufficiently large $Q$, the posterior probability of $\{\theta : \sum_{j=0}^{N_0}v_j^{2p_0}\theta_j^2 > Q\}$, under the reduced prior $\tilde\pi_n$, is asymptotically negligible in probability. Given any $\eta > 0$, by Markov's inequality,



IT 



' No -] 

- j=o J 



, • • • , Xn 



1 A^o 



where, for j = 1, . . . , A'o, 



E[9]\Xi, ...,Xn]=Y. w{m\Xu. ■ . , XnMe]\p = p™, Xi, . . . , 

m>0 



Note that conditionally on p = pm , m£ 



Emp=p^,x,,...,Xn]< 



1 



+ 



Mi)' 



n + Vj 



{n + f . 



Thus, for j = 1, . . . ,iVo, 



MI' 



Since n^E^ [(</>,• )2] = nVo [<Ai (^i )] + (Eq (Xi )] )2 , j > 1 , and Efii v f + 



2po+i)-i ^22po+i, we have 



No 



m2P0 



I 2po+l 

n + 



+ 



n + f 



2P()+1 



No 3^2po No 

< E W + E °(Eo[</'.(Xi)])' < 3 X 22^'o+i + 

Therefore, the probability $P_0^\infty(U_n^{(2)} > \eta)$ can be made arbitrarily small for all large $n$ by choosing sufficiently large $Q$. Let $Q$ be sufficiently large that $\sum_{j=0}^\infty v_j^{2p_0}\theta_{0,j}^2 < Q$. For the same $Q$, consider the set $\{\theta : \sum_{j=0}^{N_0}v_j^{2p_0}\theta_j^2 \le Q,\ d_H(P_0,P_\theta) > M\varepsilon_n\}$. In the last step, it remains to be shown that the posterior distribution of $\tilde\pi_n$ concentrates on $P_0$-centered Hellinger balls at the best rate. Precisely, we prove that $U_n^{(3)}\to 0$, as $n\to\infty$, a.s. $[P_0^\infty]$. Note that

Ihs, n"=l /0iv„ (.^i)fJ-N^ (dON^) 



( TTl 1 

The numerator of the ratio in the summation on the right-hand side of
the above inequality can be bounded above using condition (16), as in the
proof of Theorem 1. To bound the denominator below, we can use the same
arguments as in the proof of Theorem 3, replacing N with Nq = ^n^/('^Po+^)~\ 
and taking Bf < min{S^/2, {y/Q - ^Q- 5of /WP"} . Then for n sufficiently 
large that ET=No+i ^T^h ^ ^i/2> we have 

mo (l^jvo G Ep,,No{Q) ■■ fliOj - eo,jf < Bfe{\] 
> mo (\ono : E(% - 0OJ? < Blel]] > e-^^-' 



1 



with ci depending on Qq. Thus, for a suitable constant c > 0, Un ^ g-c^^n 
for all but finitely many n along almost all sample paths when sampling 
from Pq. This completes the proof. □ 

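Remark 6. If $f_0$ simultaneously has the following series expansions

$$f_0(x) = \beta_0 \cdot \phi(x) = \exp\{\theta_0 \cdot \phi(x) - \psi(\theta_0)\}, \qquad x \in [0,1],$$

where $\beta_0 = (\beta_{0,0}, \beta_{0,1}, \ldots)$ has coordinates $\beta_{0,0} = 1$ and $\beta_{0,j} = E_0[\phi_j(X_1)]$ for
$j \ge 1$, then condition (11) implies that $\beta_0$ lies in $E_{p_0}$.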

Remark 7. Since a finite set P of possible values for $p$ is considered, the
choice of weights $w(m)$ is not relevant. In the present asymptotic setting,
any set of positive weights achieves the same result.

Since the posterior distribution does not converge exponentially fast in 
Theorem 5, the rate of convergence for the posterior expected density cannot 
be derived as easily as in the previous cases. We therefore resort to another 
estimator that is Bayesian in the sense that it is based on the posterior distri- 
bution. The following construction closely follows that in [4], pages 544-545. 
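For a positive sequence $\delta_n \to 0$ as $n \to \infty$, let $H_{\delta_n}(P_\theta) = \{P_{\theta'} : d_H(P_\theta, P_{\theta'}) \le \delta_n\}$ and define

$$\delta_n^* = \inf\{\delta_n : \Pi_n(H_{\delta_n}(P_\theta) \mid X_1, \ldots, X_n) \ge 3/4 \text{ for some } P_\theta\}.$$

Take any $\hat{P}_n$ satisfying the following condition:

$$\Pi_n(H_{\delta_n^* + n^{-1}}(\hat{P}_n) \mid X_1, \ldots, X_n) \ge 3/4.$$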
As subsequently stated, such an estimator, whose definition does not require
knowledge of $p_0$, attains the optimal pointwise rate of convergence
$n^{-p_0/(2p_0+1)}$, adapting to the unknown smoothness of the true density.

Corollary 3. If the conditions of Theorem 5 are satisfied, then for a
sufficiently large constant $M > 0$, $P_0^n(d_H(P_0, \hat{P}_n) > M n^{-p_0/(2p_0+1)}) \to 0$ as
$n \to \infty$.

Proof. See the proof of Theorem 4 in [4], page 545. □ 

6. Closing remarks. This paper focuses on the estimation of densities in 
periodic Sobolev classes. The problem is approached through the use of an 
orthonormal series expansion for the log-density with single priors on the 
coefficients. The posterior expected density is shown to attain the optimal 
minimax rate of convergence under Hellinger loss for several priors. 

As mentioned in Remark 1, an interesting finding of the paper is that 
a sample-size-dependent direct Gaussian prior leads to a Bayes' estimator 
achieving the optimal minimax rate in this problem, in contrast to the in- 
finitely many normal means problem investigated by Zhao [26], who has 
shown that there is no Gaussian prior supported on Ep such that the corre- 
sponding Bayes' estimator attains the optimal minimax rate. Optimality for 
the Bayes' density estimator follows from uniform exponential convergence 
of the posterior distribution over suitable ellipsoids. In the infinitely many 
normal means problem, the rate of convergence for the Bayes' estimator is 
derived directly from the study of the risk function and uniformity holds over 
any Ep{Q) provided the power of the prior variances exactly matches the 
assumed degree of smoothness, which is not the case if the prior is supported 
on Ep. 

Another interesting result concerns adaptation. We have shown that the
posterior distribution of a sample-size-dependent prior achieves the best
pointwise rate $n^{-p_0/(2p_0+1)}$, regardless of the value of $p_0 \in P$, for every
$\theta_0 \in E_{p_0}$ satisfying conditions (11) and (12). In a recent paper, Huang [16]
has obtained results on posterior rates of convergence for density estimation
using the method of exponentials, with priors on the coefficients of the
log-density expansion via wavelets, the coefficients lying in a Besov space
$B^{\alpha}_{2,2}$ with $\alpha \in (0,1)$. This method is suitable for estimating spatially
inhomogeneous density functions, while we consider smooth, periodic functions.
Huang does not put a prior on $\alpha$; instead, she constructs a sieve prior with
mixing parameter given by the dimension of the exponential family and the
ball radius. Even though the rate she obtains has an extra logarithmic factor,
her result is valid for all points in $B^{\alpha}_{2,2}$. Our result, although achieving a
better rate, is restricted to points in $E_{p_0}$ also satisfying the aforementioned
conditions.
APPENDIX

Lemma A.l. For any pair 6, 6' £ Ep{Q), 

-| oo 

K{Pe>\\Pe)<-e'^Y.(e'^-e^f. 



(14) 
Consequently, 

1 oo 

(15) dl{Pe>,Pe)\\fe'/fe\\oo < i^e'^'T.i^^', - Ojf . 

^ 3=1 

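Proof. We use inequality (3.2) from Lemma 1 of Barron and Sheu [3],
pages 1355-1356. If $\|\log(f_{\theta'}/f_\theta)\|_\infty < \infty$, then for any constant $c$,

$$K(P_{\theta'}\|P_\theta) \le \frac{1}{2}\, e^{\|\log(f_{\theta'}/f_\theta) - c\|_\infty} \int_0^1 f_{\theta'}(x) \Big(\log\frac{f_{\theta'}(x)}{f_\theta(x)} - c\Big)^2 dx.$$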



e')-e',]\<\\{e-e').^ 



Note that for any pair 6, 6' E Ep{Q) 

me) -00 

thus, 

fe' 



70 - ^o)\\(x> 



<2B, 



log 



fe 



e'- 6) ■(/}] + - H^')] lloo < 45 < oo. 



Take c=[i^{d) - Bq] - [i^ie') - e'o]. Using the fact that supg^E^^Q) \\fe\\oo < 
e^^ and Parseval's relation, we obtain (14). Obviously, the same bound 
holds true for K{Pg\\P0'). A similar remark applies to inequality (15), which 
follows from (14) because d^^{Pg>,Pe) < K{Pg,\\Pe) and \\fe'/fe\\oo<e^^. 
□ 
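
The chain of inequalities behind the proof, squared Hellinger distance bounded by the Kullback-Leibler divergence, which is in turn bounded by the Barron-Sheu expression, can be checked numerically. The sketch below is an illustration only and rests on assumptions not fixed by the text: a particular ordering of the trigonometric basis, arbitrary illustrative coefficient vectors `theta` and `theta_p`, the normalisation of the squared Hellinger distance as the integral of the squared difference of root densities, and midpoint-grid integration on [0, 1].

```python
import numpy as np

def trig_basis(x, k):
    """Columns sqrt(2)cos(2*pi*j*x), sqrt(2)sin(2*pi*j*x), j = 1..k (constant excluded)."""
    cols = []
    for j in range(1, k + 1):
        cols.append(np.sqrt(2.0) * np.cos(2 * np.pi * j * x))
        cols.append(np.sqrt(2.0) * np.sin(2 * np.pi * j * x))
    return np.stack(cols, axis=1)

def density(theta, phi):
    """exp(theta . phi), normalised so that its grid mean (the integral) equals one."""
    w = np.exp(phi @ theta)
    return w / np.mean(w)

x = (np.arange(4096) + 0.5) / 4096           # midpoint grid on [0, 1]
phi = trig_basis(x, 2)
theta = np.array([0.5, -0.3, 0.2, 0.1])      # illustrative coefficients (assumption)
theta_p = np.array([0.1, 0.4, -0.2, 0.3])    # illustrative coefficients (assumption)

f, f_p = density(theta, phi), density(theta_p, phi)

kl = np.mean(f_p * np.log(f_p / f))                   # K(P_theta' || P_theta)
hell_sq = np.mean((np.sqrt(f_p) - np.sqrt(f)) ** 2)   # squared Hellinger distance
r = phi @ (theta_p - theta)                           # log(f_theta'/f_theta) - c, centred
barron_sheu = 0.5 * np.exp(np.max(np.abs(r))) * np.mean(f_p * r ** 2)
```

The centring constant is chosen so that only the non-constant coefficients remain in `r`; by Parseval's relation the integral of `r ** 2` then equals the sum of squared coefficient differences, which is how the bound is turned into a statement about the coefficients.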



Theorem A.1 below is an almost sure version of Theorem 2.1 in [15],
page 1239 (see also Theorem 2.2 in [16], page 505). Before stating the theorem,
we recall that if $(S, d)$ is a semi-metric space and $C$ a totally bounded
subset of $S$, then for any given $\varepsilon > 0$, the $\varepsilon$-packing number of $C$, denoted
by $D(\varepsilon, C, d)$, is defined as the largest integer $m$ such that there exists a
set $\{s_1, \ldots, s_m\} \subseteq C$ with $d(s_k, s_l) > \varepsilon$ for all $k, l = 1, \ldots, m$, $k \ne l$. The
$\varepsilon$-capacity of $(C, d)$ is defined as $\log D(\varepsilon, C, d)$.
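
As a small illustration of the definition, the following Python sketch builds a separated subset of a finite set greedily. The function name `greedy_packing` and the toy set are assumptions made for the example. A greedy pass yields a maximal separated set, so in general its size is only a lower bound for the packing number.

```python
import math

def greedy_packing(points, eps, dist):
    """Greedily build an eps-separated subset of `points`.

    The result is a maximal eps-separated set, so its size is a lower bound
    for the packing number D(eps, C, d); it need not attain it in general.
    """
    chosen = []
    for p in points:
        # keep p only if it is more than eps away from every point kept so far
        if all(dist(p, q) > eps for q in chosen):
            chosen.append(p)
    return chosen

# Toy example: C = {0, 1, ..., 100} with d(s, t) = |s - t| and eps = 25.
packing = greedy_packing(range(101), 25, lambda s, t: abs(s - t))
capacity_lower_bound = math.log(len(packing))
```

In this toy case the greedy set has four points, and four is also the packing number: five integers with pairwise gaps exceeding 25 would need a span of at least 104.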

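Theorem A.1. Let $\mathscr{P}$ be a class of probability measures that possess
densities relative to some $\sigma$-finite reference measure $\nu$ on a sample space
$(\mathscr{X}, \mathscr{A})$. Let $d$ stand for either the $L_1$- or the Hellinger metric on $\mathscr{P}$. Let
$\Pi_n$ be a sequence of priors on $(\mathscr{P}, \mathscr{B})$, where $\mathscr{B}$ is the Borel $\sigma$-field on $\mathscr{P}$.
For $P_0 \in \mathscr{P}$, let $f_0$ denote its density. Suppose that for positive sequences
$\bar{\varepsilon}_n, \tilde{\varepsilon}_n$ with $n \min\{\bar{\varepsilon}_n^2, \tilde{\varepsilon}_n^2\} \to \infty$ and $\sum_{n=1}^{\infty} \exp(-E n \tilde{\varepsilon}_n^2) < \infty$ for every
$E > 0$, constants $c_1, c_2, c_3, c_4 > 0$ and sets $\mathscr{P}_n \subseteq \mathscr{P}$, we have

(16) $\log D(\bar{\varepsilon}_n, \mathscr{P}_n, d) \le c_1 n \bar{\varepsilon}_n^2$,

(17) $\Pi_n(\mathscr{P} \setminus \mathscr{P}_n) \le c_2 e^{-(c_3+4) n \tilde{\varepsilon}_n^2}$,

(18) $\Pi_n(N(P_0; \tilde{\varepsilon}_n^2)) \ge c_4 e^{-c_3 n \tilde{\varepsilon}_n^2}$,

where $N(P_0; \tilde{\varepsilon}_n^2) = \{P : d_H^2(P_0, P)\,\|f_0/f_P\|_\infty \le \tilde{\varepsilon}_n^2\}$ with $f_P = dP/d\nu$. Then,
for $\varepsilon_n = \max\{\bar{\varepsilon}_n, \tilde{\varepsilon}_n\}$ and a sufficiently large constant $M > 0$, the posterior
probability

$$\Pi_n(\{P : d(P_0, P) > M\varepsilon_n\} \mid X_1, \ldots, X_n) \to 0 \quad \text{as } n \to \infty,$$

$P_0^{\infty}$-almost surely.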

Proof of Theorem 1. We appeal to Theorem A.1 and show that the
conditions listed earlier are satisfied for $\bar{\varepsilon}_n = \tilde{\varepsilon}_n = \varepsilon_n = n^{-p/(2p+1)}$.
Condition (16) is verified for $\mathscr{P}_n = \mathscr{P}$. It is easily seen that for some constant
K > {) depending only on p and L, /o (/^^^(x))2 dx < for all £ Ep{Q). 
Besides, for any pair $P_{\theta'}, P_\theta \in \mathscr{P}$ such that $d_H(P_{\theta'}, P_\theta) > \varepsilon_n$, a simple calculation
shows that $\|f_{\theta'} - f_\theta\|_\infty \ge \|f_{\theta'} - f_\theta\|_2 \ge 2e^{-B}\varepsilon_n$; see [5], page 252, for
the monotone convergence of the $L_q$-norm, $q \ge 1$, to the essential supremum
norm w.r.t. $\lambda$ on $[0, 1]$, $\|\cdot\|_q \uparrow \|\cdot\|_\infty$. Then by a result due to Birman and
Solomjak [6] (see also [20], pages 22-23), for a suitable constant $c > 0$,

log D{en, ^, dn) < log L»(2e-^e„, ^, || • lU) < ce"^^" = cnel 

Condition (17) is trivially verified. Finally, recalling that Bf = e~^^ , condi- 
tion (18) follows from 

<^„({0:4(Po,i^0)ll/o//0||oo<4}) = n„(iv(Po;4)), 

where (1) and (15) have been applied. □ 
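
The entropy bound in the proof matches the order $n\varepsilon_n^2$ because, for $\varepsilon_n = n^{-p/(2p+1)}$, both $\varepsilon_n^{-1/p}$ and $n\varepsilon_n^2$ equal $n^{1/(2p+1)}$. The short Python check below illustrates this arithmetic for a few hypothetical values of $p$ and $n$; the function name `rate` is an assumption made for the example.

```python
import math

def rate(n, p):
    """eps_n = n^{-p/(2p+1)} for smoothness p and sample size n."""
    return n ** (-p / (2 * p + 1))

for p in (1, 2, 3):
    for n in (10**2, 10**4, 10**6):
        eps = rate(n, p)
        entropy_order = eps ** (-1 / p)    # order of the entropy bound
        sample_order = n * eps ** 2        # order required by condition (16)
        assert math.isclose(entropy_order, sample_order, rel_tol=1e-9)
        assert math.isclose(sample_order, n ** (1 / (2 * p + 1)), rel_tol=1e-9)
```

The same arithmetic shows that $n\varepsilon_n^2 \to \infty$, so the growth requirement on the sequences in Theorem A.1 holds for this choice of $\varepsilon_n$.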